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Abstract: We construct minimax optimal non-asymptotic confidence sets 
for low rank matrix recovery algorithms. These are employed to devise 
sequential sampling procedures that guarantee recovery of the true matrix 
in Frobenius norm after a data-driven stopping time n for the number of 
measurements that have to be taken. With high probability, this stopping 
time is minimax optimal. We detail applications to quantum tomography 
problems where measurements arise from Pauli observables. We also give 
a theoretical construction of a confidence set for the density matrix of 
a quantum state that has optimal diameter in nuclear norm. The non- 
asymptotic properties of our confidence sets are investigated in a simulation 
study. 
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1. Introduction 

Consider the high-dimensional matrix trace regression model 

Yi=tr{X l 6)+£i, i = 1,..., n, (1) 

where the £*’s are random noise variables, independent of the random design 
matrices X 1 , and where the matrix 6 is the object of inferential interest. We 
denote the law of (Y, A) given 6 by P g. To reflect the structure of the main 
application we have in mind - quantum tomography, introduced in detail below 
- we assume that X 1 and 9 are both d x d square matrices, and study the case 
where the number n of measurements taken may be smaller than the effective 
parameter dimension d 2 . Recovery of 6 in such situations is still possible by 
compressed sensing techniques [5, 22, 21, 29], under two main structural as¬ 
sumptions on the model: 1) the matrix 9 is of low rank and 2) the measurement 
matrices X 1 satisfy the restricted isometry (or a related coherence) property. In 
this case recovery of a rank k matrix 8 is possible in Frobenius distance || • ||f 
by, e.g., the Matrix Lasso 9 n : for any e > 0 and with high Pg-probability, 

|| 0 n — 9\\p < e as soon as n > kdlogd, 

1 
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where logd = log 1 * d, rj > 0, is the so-called ‘polylog’ function. The design used 
in quantum tomography is such that the X 1 are randomly drawn from a basis 
{Ei,... ,Ed 2 } of the space of d x d matrices, and one samples fewer than all 
d 2 basis coefficients tr(Eid) without losing recovery guarantees for low rank 
matrices 6. In experimental settings (e.g., [16]), d = 2 N where TV is a possibly 
large number of particles, but 9 will represent an approximately pure quantum 
state, motivating the low rank hypothesis and explaining the interest of quantum 
information theorists in dimension reduction methods (see the appendix and 
[14, 13, 25, 10, 15, 32]). 

In practice the implementation of the compressed sensing paradigm requires 
a way to decide how many measurements n should be taken. The preceding 
theoretical bound n > kdlogd is not useful for this because it may involve 
unspecified constants, but also, more importantly, because the rank k of 6 is 
typically not known. Instead one can try to find a data driven stopping rule 
n that guarantees that recovery with precision e occurs after n measurements, 
with high probability. In the quantum tomography context such stopping rules 
are called ‘certificates’ (see Section IV in [10]), as they certify the reconstruction 
of the true quantum state 9. It is not difficult to see, and will be made precise 
below, that the construction of such stopping rules is intimately connected to 
the construction of a (sequential) confidence region for the unknown parameter 
9, and due to its importance in applications this topic has received considerable 
attention recently by physicists, see [8, 3, 33, 2, 10, 31]. None of the previous 
constructions has succeeded, however, in constructing an optimal stopping rule 
for which n ss kdlogd holds with high probability. 

The main contribution of the present paper is to construct optimal non- 
asymptotic Frobenius norm confidence regions for low rank parameters in the 
model (1), and to use them to devise optimal sequential data driven stopping 
rules n (‘certificates’) for the measurement process. That such procedures exist 
may at first look surprising in view of negative results in the ‘sparse’ compressed 
sensing setting in [26], but our results reveal the more favourable information- 
theoretic structure of the matrix model. While our techniques are based on 
unbiased risk estimation ideas that were first used in nonparametric statistics 
(see [24, 30], and also [4]) and that apply in a general setting, we lay out the 
details for a basic (sub-) Gaussian design and noise model, as well as for the 
Pauli observation scheme relevant in quantum tomography (see [10] and Condi¬ 
tion lb) below). We shall also address the more difficult question of constructing 
confidence regions for a quantum state matrix in the stronger nuclear norm. Re¬ 
lationships between our findings and the recent literature on confidence regions 
for high-dimensional statistical parameters are discussed at the end. We also 
investigate the performance of our procedures in basic simulation study. 
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2. The framework of matrix compressed sensing 
2.1. Notation 

Denote by M<j(IK) the space of d x d matrices with entries in IK = C or IK = R. 
We write || • ||^ for the usual Frobenius norm on M<j(K) arising from the inner 
product tr(A T B) = (A,B) F . Moreover let H^K) be the set of all Hermitian 
matrices (equal to the set of all symmetric d x d matrices when IK = R). The 
norm symbol || • || denotes the standard Euclidean norm on C" arising from the 
Euclidean inner product (•,•). 

We denote the usual operator norm on M^K) by || • || op . For M £ Md(IK) let 
: k = 1,..., d) be the eigenvalues of M T M (which are all real-valued and 
positive). The Zi-Schatten, trace , or nuclear norm of M is defined as 

l|M|| Sl =£l A fl- 

j<d 


Note that for any matrix M of rank 1 < r < d, 

\\M\\ f < \\M\\ Sl < Vr\\M\\ F . (2) 

We will consider parameter subspaces of Hd(IK) described by low rank con¬ 
straints on 6 , and denote by R(k ) the space of all Hermitian d x d matrices 
that have rank at most k , k < d. In quantum tomography applications, we may 
assume an additional ‘shape constraint’, namely that 9 is a density matrix of a 
quantum state, and hence contained in state space 

0+ = {9 £ Hd(K) : tr(9) = 1 ,6 > 0}, 

where 9 iz 0 means that 9 is positive semi-definite. In fact, in most situations, 
we will only require the bound H^Hsi < 1 which holds for any 9 in 0 + . 

2.2. Sensing matrices 

We now specify assumptions on the design matrices X 1 used in our observation 
model (1). When 9 has real-valued entries we shall restrict to design matrices 
X 1 with real-valued entries too, and for general 9 £ H^C) we shall assume 
X 1 £ H ( i(C). This way, in either case, the measurements tr{Xi0y s and hence 
the Yy s are all real-valued. Note that in Part a) below the design matrices are 
not Hermitian but our results can easily be generalised to symmetrised sub- 
Gaussian ensembles (as those considered in ref. [21]). Part b) corresponds to 
the quantum tomography measurement model used in [13, 25, 10, 14] - we refer 
to the appendix for a detailed derivation. 

Condition 1 a) 9 £ H^R), ‘isotropic’ sub-Gaussian design: The ran¬ 
dom variables [X l m k ), 1 < m, k < d, i = 1,..., n, generating the entries of 
the random matrix X 1 are i.i.d. distributed across all indices i,m,k with 
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mean zero and unit variance. Moreover, for every 9 £ M<j(K) such that 
||^||_f < 1 the real random variables Zi = tr{X l 9) are sub-Gaussian: for 
some fixed constants t\,T 2 >0 independent of 9, 

Ee xz ' < r ie y ^ VA e K. 


b) 9 £ Hd(C), random sampling from a basis (‘Pauli design’): Let 

{E\, ..., E d z} C H d (C) be a basis of M^C) that is orthonormal for the 
scalar product (•, -) F and such that the operator norms satisfy, for all i = 
l,...,d 2 , 

ll-Eillop ^ ~q, 

for some K > 0. [In the Pauli basis case we have K = 1.] Assume the X 1 , 
i = 1 ,n, are draws from the finite family £ = {dEi : i = 1 ,..., d 2 } 
sampled uniformly at random. 

The above examples all obey the matrix restricted isometry property , that we 
describe now. Note first that if X : R. dxd —> M" is the linear ‘sampling’ operator 

X -.9^X9= ( tr(X 1 9),..., tr{X n 0 )) T , (3) 


so that we can write the model equation (1) as Y = X9 + e, then in the above 
examples we have the ‘expected isometry’ 

E^\\X9\\ 2 = \\9\\ 2 f . 

Indeed, in the isotropic design case we have 

lnm\ 2 = lj2 E (j212 x k^rn,k) = w 6 ^ ( 4 ) 

i=l \ m k / m k 

and in the ‘basis case’ we have, from Parseval’s identity and since the A'*’s are 
sampled uniformly at random from the basis, 

1 d 2 " d2 

-E\\xe\\ 2 = = Ej)\(Ej,e ) F \ 2 = \\9\\ 2 f . (5) 


The restricted isometry property (RIP) requires that this ‘expected isometry’ 
holds, up to constants and with probability >1 — 5, for a given realisation of 
the sampling operator, and for all d x d matrices 9 of rank at most k: 


sup 

OeR(k) 


^ll 2 -w 

m 2 F 


< T n (k), 


( 6 ) 


where T r (k ) are some constants that may depend, among other things, on the 
rank k and the ‘exceptional probability’ S. For the above examples of isotropic 
and Pauli basis design inequality (6) can be shown to hold with 


!(*) = ■ 


,kd ■ logd 


( 7 ) 
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where c = c(5) = 0(1/8 2 ) as <5 —>• 0 is a fixed constant. See refs. [5, 25] for these 
results. 

2.3. Gaussian and Bernoulli errors, and Pauli observables 

We still have to specify the distribution of the errors E; t in the model (1). In the 
quantum tomography setting of Condition lb), if we fix an element £) G 8 for 
the moment, then as detailed in the appendix the observations Y t = dtr(Ei9)+£i 
are themselves an average of repeated samples from a Bernoulli random variable 
Bi taking values {1,-1} with probabilities given by 


1 + \/dtr(EiQ) 
2 



More precisely, 



where 



is the effective error arising from the measurement procedure making use of T 
preparations to estimate each quantum mechanical expectation value. We could 
work with this Bernoulli error model directly, but since the efs are themselves 
sums of independent random variables, an approximate Gaussian error model 
will be appropriate, too. Note further that 


N < 2 Vd, Ee 2 < ^Var (B iA ) < ^ 


( 8 ) 


so the variances Ee 2 = a 2 are bounded by d/T. A natural assumption is then 

Condition 2 The Si,i = 1 ,...,n, are i.i.d. N(0,a 2 ) where a 2 < v for some 
known constant v. 

This unifies the exposition for both designs considered in Condition 1, but we 
note that our proofs are valid in the exact Bernoulli error model as well, see 
Remark 2 below. 

2-4- Minimax estimation under the RIP 

Assuming the matrix RIP from (6) to hold and Gaussian noise e ~ N(0, a 2 I n ), 
one can show that the minimax risk for recovering a Hermitian rank k matrix 


is 


inf sup Eg||0 — 9\\ 2 f ~ a 


e &<ER(k) 


2 dk 
n 


(9) 
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where ~ denotes two-sided inequality up to universal constants. For the upper 
bound one can use the nuclear norm minimisation procedure or matrix Dantzig 
selector from Candes and Plan [5] (see also [25] for the case of Pauli-design), and 
needs n to be large enough so that the matrix RIP holds with r„(fc) < cq where 
Co is a small enough numerical constant. Such an estimator 9 then satisfies, for 
every 9 £ R(k) and those n £ N for which T n (k) < c 0 , 

kd 

\\6 - 9\\ 2 f < D(6)a 2 — , (10) 

n 

with probability > 1 — 26, and with the constant D(6) depending on 6 and also 
on c 0 (suppressed in the notation). Note that the results in [5] use a different 
scaling in sample size in their Theorem 2.4, but eq. (II.7) in that reference 
explains that this is just a matter of renormalisation. The same bound holds for 
the Bernoulli noise model from Subsection 2.3, see [10]. 

3. Main results 

We now turn to the problem of quantifying the uncertainty of estimators 9 that 
satisfy the risk bound (10). In fact the procedures we construct could be used 
for any estimator of 9, but the conclusions are most interesting when used for 
minimax optimal estimators 9. 

3.1. Confidence sets and sequential sampling protocols 

From a statistical point of view the problem at hand is the one of constructing 
a confidence set for 9: a data-driven subset C n of Md(C) that is ‘centred’ at 9, 
that satisfies 

P g {9 G C n ) > 1 - a, 0 < a < 1, 

for a chosen ‘coverage’ or significance level 1 — a, and such that the Frobenius 
norm diameter |(7 „|,f reflects the accuracy of estimation, that is, it satisfies, 
with high probability, 

\Cn\l^\\e-e\\ 2 F . 

In particular such a confidence set provides, through its diameter C n \ p, a data- 
driven estimate of how well the algorithm has recovered the true matrix 9 in 
Frobenius-norm loss, and in this sense provides a quantification of the uncer¬ 
tainty in the estimate. 

In an experimental situation confidence sets ( C n : n £ N) can be used to 
decide sequentially whether more measurements should be taken (to improve 
the recovery rate), or whether a satisfactory performance has been reached. 
Concretely, for given n we check if \C n \F < e, and continue to take further 
measurements if not. Assuming 9 satisfies the minimax optimal risk bound dk/n 
from (10), we expect to need, ignoring constants, 

dk 2 dk 

— < e and hence at least n > 
n C 
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measurements. Note that we also need the RIP to hold with T n (k) from (7) less 
than a small constant Co, which requires the same number of measurements, 
increased by a further poly-log factor of d (and independently of a). The goal 
is then to prove that a sequential procedure based on C n does not require more 
than approximately 

dklogd 


samples (with high probability). This is made precise in the following definition, 
where we recall that R(k) denotes the set of d x d Hermitian matrices of rank 
at most k < d. 

Definition 1 Let e > 0, <5 > 0 be given constants. An algorithm A returning a 
dx d matrix 9 after h £ N measurements in model (1) is called an (e, S) - adap¬ 
tive sampling procedure if, with P g -probability greater than 1 — S, the following 
properties hold for every 9 £ R(k) and every 1 < k < d: 


\\0-0\\ F <e, ( 11 ) 

and, for some positive constants C(5), 7 , the stopping time h satisfies 

faiOogdy (12) 

e z 

Such an algorithm provides recovery at given accuracy level e with h mea¬ 
surements of minimax optimal order of magnitude (up to a poly-log factor), and 
with probability greater than 1 — 6. The sampling algorithm is adaptive since 
it does not require the knowledge of k , and since the number of measurements 
required depends only on k and not on the ‘worst case’ rank d. 

Our first main result is the following theorem, whose proof relies on the 
construction of non-asymptotic confidence sets C n for 9 at any sample size n, 
given in the next subsection. 

Theorem 1 Consider observations in the model (1) under Conditions lb) and 
2, and where 9 £ 0+. Then an adaptive sampling algorithm in the sense of 
Definition 1 exists for any e, 5 > 0. 

The result above holds for isotropic design from Condition la) too, without 
the constraint 9 £ 0 + , see Remark 1 below. For Pauli design the assumption 
9 £ 0 + (instead of just 9 £ Ma(K)) is, however, necessary: Else the example 
of 9 = 0 or 9 = Ei - where Ei is an arbitrary element of the Pauli basis - 
demonstrates that the number of measurements has to be at least of order d 2 : 
otherwise with positive probability Ei is not drawn at a fixed sample size. On 
this event both the measurements and 9 coincide under the laws Po and P^, 
so we cannot have \\9 — 0||f < e AND ||0 — Ei\\p < e simultaneously for every 
e > 0, disproving existence of an adaptive sampling algorithm. In fact, the 
crucial condition for Theorem 1 to work is that the nuclear norms 116* 11, 5 , are 
bounded by an absolute constant (here = 1 ), which is violated by 11 iT-,; 11 .s, = y/d. 
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3.2. Frobenius norm confidence sets based on unbiased risk 
estimation 

3.2.1. An optimal confidence region for n < d 2 

We suppose that we have two samples at hand, the first being used to construct 
an estimator 9 , such as the one from (10). We freeze 9 and the first sample in 
what follows and all probabilistic statements are under the distribution Pg of 
the second sample Y, X of size n £ N, conditional on the value of 9. We define 
the following residual sum of squares (RSS) statistic 

r n = -\\y-X9\\ 2 -a 2 , (13) 

n 

which satisfies E gf n = ||6 — 0|||, in the model (1) under Conditions 1 and 2 
(see the proof of Theorem 2 below). We assume for now that a is known, see 
Subsection 3.2.4 below for a discussion of the necessary modifications in the 
general case. Given a > 0, let be quantile constants such that 

Pr ~ 1 ' ) > = a ( 14 ) 

(these constants converge to the quantiles of a fixed normal distribution as 
n -A 00 ), let z a = log(3/a) and, for z > 0 a fixed constant to be chosen, define 
the confidence set 

Cn= jus H d (C) : \\v - 9 \\ 2 f <2 ff n + z^+ ~ j , (15) 

where 

z 2 = z 2 (a, d, n, a , v) = z a / 3 o 2 max(3||v — 9\\ 2 Fl Azd/n). 

Note that in the ‘quantum shape constraint’ case 9 £ 0+ we can always upper 
bound ||u — 9\\ F < 2 in the definition of z. which gives a confidence set that 
is easier to compute and of only marginally larger overall diameter. In some 
situations, however, the quantity z/yfn is of smaller order than 1/y/n, and the 
more complicated expression above is generally preferable. 

It is not difficult to see (using that x 2 < y + x/y/n implies x 2 < y + l/n) that 
the mean square Frobenius norm diameter of C n is of order 

MC n \l < \\e - 9\\% + ( 16 ) 

II y II 

Whenever d > yfn - so as long as at most n < d 2 measurements have been taken 
the deviation terms are of smaller order than kd/n for any k > 1, and hence 
C n has minimax optimal expected squared diameter whenever the estimator 9 
is minimax optimal as in (10). 

The following result shows that C n is a valid confidence set for arbitrary 
Hermitian d x d matrices (without any rank constraint). Note that the result is 
non-asymptotic - it holds for every n £ N. 


imsart-generic ver. 2011/11/15 file: ellllowrankrevision.tex date: December 22, 2015 





A. Carpentier, J. Eisert, D. Gross and R. Nickl/Uncertainty Quantification for Matrix CSt) 


Theorem 2 Let 9 £ H^(C) be arbitrary and let Pg be the distribution ofY,X 
from model (1) under Condition 2. 

a) Assume the design satisfies Condition la) and let C n be given by (15) with 
z = 0. We then have for every n £ N that 



Pg(9 £ C n ) > 1 


where c is a numerical constant. In the case of standard Gaussian design, c = 
1/24 is admissible. 

b) Assume the design satisfies Condition lb) with constant K > 0, let C n be 
given by (15) with z > 0 and assume also that 6 £ © + and 9 £ 0+ (that is, both 
satisfy the ‘quantum shape constraint’). Then for every n £ N, 


Pg(9 €C n )>l- — - 2e~ c ^ z 


where C{K) = 1 /[(16 + 8/3)A' 2 ]. 


In Part a), if we want to control the coverage probability at level 1 — a, n 
needs to be large enough so that the third deviation term is controlled at level 
a/3. In the Gaussian design case with a = 0.05, n > 100 is sufficient, for smaller 
sample sizes one can use the confidence region from the next subsection. The 
bound in b) is entirely non-asymptotic for suitable choices of z. Also note that 
the quantile constants z,z a ,f a all scale at least as 0( log(l/a)) in the desired 
coverage level a —> 0 . 

As mentioned above, the confidence set from Theorem 2 is optimal whenever 
the desired performance of ||0 — 9\\ 2 F is no better than of order 1 /y/n, corre¬ 
sponding to the important regime n < d 2 for sequential sampling algorithms. 
Refinements for measurement scales n > d 2 are also of interest - we present two 
optimal approaches in the next two subsections for the designs from Condition 
1 . 

3.2.2. Isotropic design and a confidence set based on U-statistics 

Consider isotropic i.i.d design from Condition la), and an estimator 9 based on 
an initial sample of size n (all statements that follow are conditional on that 
sample) . Collect another n samples to perform the uncertainty quantification 
step. Define the 17-statistic 



i<j m : k 


(17) 


whose Eg-expectation, conditional on 9 , equals ||0 — 6\(p in view of 



m' ,k' 
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Define 

Cn = € H £ j(K) : ||u — 6\\'p < R n + z a ^ n | (18) 

where 

_ c^e-ewp , c 2 d 

Zat.n — /— i 

yjn n 

and C\ > Ci||^||f, C 2 > witla Q constants depending on a and the upper 

bound v for a from Condition 2. Note that if 9 £ 0 + then ||0 ||f < 1 can be used 
as an upper bound in Ci,i = 1,2. In practice the constants Q can be calibrated 
by Monte Carlo simulations (see the implementation section below), or chosen 
based on concentration inequalities for /7-statistics (see ref. [12], Theorem 4.4.8). 
This confidence set has expected diameter 

Ve\C n \ 2 F <\\6-0\\ 2 F + Cl + C2d , 

n 

and hence is compatible with any minimax recovery rate ||0 — 0\\ F < kd/n from 
(10), where k > 1 is now arbitrary. For suitable choices of Q, we now show that 
C n also has non-asymptotic coverage. 

Theorem 3 Assume Conditions la) and 2, and let C n be as in (18). For every 
a > 0 we can choose £i(a) = 0(^/l/a), i = 1 , 2, large enough so that for every 
n £ N we have 

Pg(0 £ C n ) > 1 — a. 


3.2.3. Pauli design: Re-averaging basis elements when n > d 2 


For the design from Condition lb) where we sample uniformly at random from 
a (scaled) basis {dE \,..., dE d 2 } of M<j(C), the //-statistic approach from Theo¬ 
rem 3 appears not to be viable, and thus for d < i Jn the existence of an optimal 
confidence region still needs to be ensured. When d < sjn we are taking n> d 2 
measurements, and there is no need to sample at random from the basis as we 
can measure each individual coefficient, possibly even multiple times. Repeat¬ 
edly sampling a basis coefficient tr(Ei-O) leads to a reduction of the variance of 
the measurement by averaging. More precisely, when taking n = md 2 measure¬ 
ments for some (for simplicity integer) m > 1 , and if (Yp i : l = 1 ,... ,m) are 
the measurements Y, corresponding to the basis element E/-, k £ { 1 ,..., e? 2 }, we 
can form averaged measurements 





1 = 1 


Vrnd{E k ,6) F + e k , 


Cfc 



E £l ~ ^(Ow 2 )- 

i =l 


We can then define the new measurement vector Z = (Z i,..., Z d 2 ) T (using also 
m = n/d 2 ) 

Z k = Z k - y/n(9, E k ) = \fn{E kl 9 - 0) F + e k , k = l,...,d 2 
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and the statistic 


R n = -\\z\\l d ->- 

n k n 


(19) 


which estimates ||0 — 6\\ 2 F with precision 

d 2 d 2 

Rn - \\e - ef F = V e k (E k , 9 - 0) F + - V(4 - Ee 2 

» rrt L ' n L ^ 


k=1 


k=1 


= 0 ,ht 4 ; + " V 

y sjn n J 

Hence, for 2 a the quantiles of a 1V(0,1) distribution and ^ QjCr as in (14) with d 2 
replacing n there, we can define a confidence set 


C n ={vG m d (C) : \\v-e\\ 2 F <R n + 


z a /2Vp - 0 \\f Ca/2 ,<jd 


( 20 ) 


which has non-asymptotic coverage 

P e{0&C n ) >l-o 

for every n £ N, by similar (in fact, since Lemma 1 is not needed, simpler) 
arguments as in the proof of Theorem 2 below. The expected diameter of C n is 
by construction 

Ke\C n \ 2 F <\\0-O\\ 2 F + —, ( 21 ) 

n 

now compatible with any rate of recovery kd/n,l < k < d. The case of unknown 
variance is discussed in the next subsection. 


3.2.4■ Unknown variance 

The [/-statistic based confidence set from (18) does not require knowledge of 
er but works only for the design from Condition la). For Pauli design from 
Condition lb) we can use the confidence sets C n in Theorem 2 or C n in (20), but 
these do require exact knowledge of the noise variance a 2 . As described before 
( 8 ) above, in the Pauli case a 2 can be apriori bounded by d/T, where T is the 
number of preparations used to measure each individual Pauli observable. If T > 
n then the statistics r n and R n from (13) and (19) above can be used without 
subtracting a 2 and <j 2 d 2 /n, respectively, in their definitions. The coverage proofs 
then go through with minor modifications simply by noting that these centerings 
are of sufficiently small order of magnitude a 2 < d/T < d/n and o 2 d 2 /n < 
d 3 /Tn < d/n compared to the minimax rate of estimation, and by using the 
upper bound er 2 < v = d/T in all relevant constants featuring in the definition 
of C n ,C n . 

Typically preparing T > n measurements of a fixed Pauli observable is not 
a major problem in experimental situations. If for some reason this cannot be 
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done, one can make sure that each tr(E.id) is at least measured twice (so T > 
2), say in batches Yi,..., Y n / 2 and Y n / 2 + i,..., Y n , and then use the modified 
statistic 



in the construction of the confidence set. Arguments similar to above, using 
concentration inequalities for Gaussian chaos of order two (Theorem 3.1.9 in 
[ 12 ]) then allow for the construction of a confidence region that does neither 
require knowledge of a 2 < v nor T > n. Details are omitted. 

3.3. A confidence set in trace norm under quantum shape 
constraints 

The confidence sets from the previous subsections are all valid in the sense that 
they contain information about the recovery of 9 by 9 in Frobenius norm || ■ ||^. 
It is of interest to obtain results in stronger norms, such as for instance the 
nuclear norm || • || ^, which is particularly meaningful for quantum tomogra¬ 
phy problems since it then corresponds to the total variation distance on the 
set of ‘probability density matrices’. The absence of the ‘Hilbert space geom¬ 
etry’ induced by the relationship of the Frobenius norm to the inner product 
makes this problem significantly harder, both technically and from an 
information-theoretic point of view. In particular the quantum shape constraint 
9 £ © + is crucial to obtain any results whatsoever. For the theoretical results 
presented here it will be more convenient to perform an asymptotic analysis 
where min(n, d) —> oo (with o, O- notation to be understood accordingly). 

Instead of Condition 1 we now consider more generally any design (JC,« = 
1,..., n) in model (1) that satisfies the matrix RIP ( 6 ) with 



( 22 ) 


We shall still use the convention discussed before Condition 1 that 6 and the 
matrices X 1 are such that tr(X l 9) is always real-valued. 

In contrast to the results from the previous section we shall now assume a 
minimal low rank constraint on the parameter space: 

Condition 3 0 £ R + (k) := R{k) D 0+ for some k satisfying 



This in particular implies that the RIP holds with r n (k) = o(l). Given this 
minimal rank constraint 9 £ R + {k ), we now show that it is possible to construct 
a confidence set C n that adapts to any low rank 1 < ko < k. Here we may choose 
k = d but note that this forces n>d 2 (for Condition 3 to hold with k = d). 
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We assume that there exists an estimator 0pii ot that satisfies, uniformly in 
R(k 0 ) for any ko < k and for n large enough, 


II? 


Pilot 


\l < Da 2 


k Q d 


i(*o) 


(23) 


where D = D(S) depends on <5, and where so-defined r n will be used frequently 
below. Such estimators exist as has already been discussed before (10). We shall 
in fact require a little more, namely the following oracle inequality: for any k 
and any matrix S of rank k < d, with high probability and for n large enough, 


||0piiot-0||F<||0-^||F + r„(fc), (24) 


which implies (23). Such inequalities exist assuming the RIP and Condition 3, 
see, e.g., Theorem 2.8 in ref. [5]. Starting from 0pii o t one can construct (see 
Theorem 5 below) an estimator that recovers 9 £ R(k) in nuclear norm at 
rate ky/d/n , which is again optimal from a minimax point of view, even under 
the quantum constraint (as discussed, e.g., in ref. [21]). We now construct an 
adaptive confidence set for 9 centred at a suitable projection of f?pii 0 t onto 0+. 

In the proof of Theorem 4 below we will construct estimated eigenvalues 
(A j,j = 1,... ,d) off? (see after Lemma 3). Given those eigenvalues and f?piiot, 
we choose k to equal the smallest integer < d such that there exists a rank k 
matrix O' for which 


\\0' - 0piiot||F < r n (k) and 1 - ^ A j < 2k^/djn 

J<k 


is satisfied. Such k exists with high probability (since the inequalities are satisfied 
for the true 9 and A,’s, as our proofs imply). Define next d to be the (-,-)f- 
projection of f?pii ot onto 

R + {2k) :=R{2k)nQ+ 


and note that, since 2k > fc, 

ll^Piiot - d \\f = H^Piiot — R + ( 2 A:)||f < H^Piiot — ^ ||f < r n (k). 
Finally define, for C a constant chosen below, 


= {u £ 0+ : ||v - ^|| Sl < cVlr n (k)} . 


(25) 


(26) 


Theorem 4 Assume Condition 3 for some 1 < k < d, and let 5 > 0 be given. 
Assume that with probability greater than 1 — 2(5/3, a) the RIP (6) holds with 
T n (k) as in (22) and b) there exists an estimator f?pn 0 t for which (2f) holds. Then 
we can choose C = C(S) large enough so that, for C n as in the last display, 

liminf inf P g(9 £ C n ) > 1 — <5. 

mm(n,d)— >-oo 0£.R+(fc) 
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Moreover, uniformly in R + (ko), 1 < ko < k, and with P q- probability greater than 

1-5, 

|C„|si < \fkor n (k 0 ). 

Theorem 4 shows how the quantum shape constraint allows for the construc¬ 
tion of an optimal nuclear norm confidence set that adapts to the unknown low 
rank structure. A careful study of certain hypothesis testing problems (combined 
with lower bound techniques for confidence sets as in [17, 26]) shows that the 
assumption 9 £ R + (k) in the above theorem is actually necessary, and cannot 
be relaxed to 9 £ R(k). See [7], Theorem 4. 

3.4- Conclusions 

We have constructed adaptive confidence regions for matrix parameters 9 in the 
trace regression model (1). These confidence regions contract at the minimax 
optimal rates for low rank parameters, either in Frobenius or nuclear norm, and 
are ‘honest’ (in the sense of [24], see also [30, 17]). The conditions employed are 
naturally compatible with quantum tomography applications - where 6 is the 
density matrix of a quantum state, and where the noise variance has an a priori 
upper bound that can be controlled experimentally. This in turn can be used to 
demonstrate the existence of fully adaptive sequential sampling protocols that 
generate valid certificates for the recovery of unknown low rank quantum states. 

While it can be shown on the one hand (see Theorem 4 in [7]) that our 
results for the nuclear norm (Theorem 4) fundamentally rely on the ‘quantum 
shape constraint’ 9 £ 0 + , our results for the Frobenius norm on the other 
hand are valid in a general compressed sensing inference setting. This may seem 
surprising in light of negative results in [26] , where it is shown that in the related 
‘sparse’ high-dimensional linear model, signal strength assumptions (inspired by 
the nonparametric statistics literature, [11, 17]) are generally necessary for the 
existence of f^-confidence regions for the entire parameter vector. However, the 
information theoretic structure of the matrix inference problem is different, as 
is also illustrated by the fact that the signal detection rates in the model ( 1 ) in 
Frobenius norm do not depend on the low rank structure at all (see Theorem 
1 in [7]). In this sense, our findings in the matrix regression model form a 
remarkable exception to the rule that uncertainty quantification methodology 
does not generally exist for high-dimensional adaptive algorithms, unless one 
restricts the inferential interest to a simple semi-parametric low-dimensional 
functional ([34, 35, 20, 6 ]). 

4. Simulation experiments 

In order to illustrate the methods from this paper, we present some numerical 
simulations. The setting of the experiments is as follows: A random matrix 77 £ 
M d (C) of norm || 77 H^ = i ? 1 / 2 is generated according to two distinct procedures 
that we will specify later, and the observations are now 

Yi = tr(A» +£i. 
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where the £j are i.i.d. Gaussian of mean 0 and variance 1. The observations 
are reparametrised so that if represents the ‘estimation error’ 0 — 0, and we 
investigate how well the statistics 


= -||Y|| - 1 and R n = " 

n nyn — 1) 




i<j m,k 


estimate the ‘accuracy of estimation’ ||77||^ = ||0 — 0\\p, conditional on the 
value of 0. We will choose rj in order to illustrate two extreme cases: a first 
one where the nuclear norm is ‘small’, corresponding to a situation where 

the quantum constraint is fulfilled; and a second one where the nuclear norm is 
large, corresponding to a situation where the quantum constraint is not fulfilled. 
More precisely we generate the parameter t) in two ways: 

• ‘Random Dirac’ case: set a single entry (with position chosen at random 
on the diagonal) of rj to R 1 ^ 2 , and all the other coordinates equal to 0. 

• ‘Random Pauli’ case: Set rj equal to a Pauli basis element chosen uniformly 
at random and then multiplied by f? 1 / 2 . 

The designs that we consider are the Gaussian design, and the Pauli design, 
described in Condition 1. We perform experiments with d = 32, R £ {0.1,1} 
and n £ {100, 200,500,1000, 2000, 5000}. Note that d 2 = 1024, so that the first 
four choices of n correspond to the important regime n < d 2 . Our results are 
plotted as a function of the number n of samples in Figures 1, 2, 3, 4. The solid 
red an blue curves are the median errors of the normalised estimation errors 

\/ Rn R i \/ r'n R 

f?l/2 ’ aild Rl/2 ’ 

after 1000 iterations, and the dotted lines are respectively, the (two-sided) 90% 
quantiles. We also report (see Tables 1, 2, 3, 4) how well the confidence sets based 
on these estimates of the norm perform in terms of coverage probabilities, and 
of diameters. The diameters are computed as 

C £ 4/2 \ 1/2 

^UStat-TC" | 

) ’ 

C fVaX 1 / 2 
°RSS'" I 

Vn ) ' 


CuStatrf 
n 

for the U-Statistic approach and 




for the RSS approach, where we have chosen Custat = 2.5, Crss = 1 and 
^ustat = Crss = 6 for all experiments -calibrated to a 95% coverage level. 
From these numerical results, several observations can be made: 

1) In Gaussian random designs, the results are insensitive to the nature of 
rj (see Figures 1 and 2 and Tables 1 and 2). This is not surprising since the 
Gaussian design is ‘isotropic’. 
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Gaussian design, eta = Dirac, R A 2 = 0.1 



0 1000 2000 3000 4000 5000 


n 


Gaussian design, eta = Dirac, R A 2 = 1 

cc 

c\T o 


^ ° 

CC o 

C\J O 
c T 7 
CC 1 

0 1000 2000 3000 4000 5000 

n 



Fig 1. Gaussian design, and random Dirac (a single entry, chosen at random, is non-zero 
on the diagonal) rj, with R = 0.1 (left picture) and R = 1 (right picture). 


2) For Pauli designs with the quantum constraint (see Figure 3 and Table 3) 
the RSS method works quite well even for small sample sizes. But the U-Stat 
method is not very reliable - indeed we see no empirical evidence that Theorem 
3 should also hold true for Pauli design. 

3) For Pauli design and when the quantum shape constraint is not satisfied 
our methods cease to provide reliable results (see Figure 4 and in particular 
Table 4). Indeed, when the matrix 77 is chosen itself as a random Pauli (which 
is the hardest signal to detect under Pauli design) both the RSS and the U-Stat 
approach perform poorly. The confidence set are not honest anymore, which is in 
line with the theoretical limitations we observe in Theorem 2. Figure 4 illustrates 
that the methods do not detect the signal, since the norm of 77 is largely under¬ 
evaluated for small sample sizes. These limitations are less pronounced when 
n > dr. In this case one could use alternatively the re-averaging approach from 
Subsection 3.2.3 (not investigated in the simulations) to obtain honest results 
without the quantum shape constraint. 
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CM 

< 

cc 

CM 

c 

CC 


Gaussian design, eta = Pauli, R A 2 = 0.1 

CO 


CO 

0 1000 2000 3000 4000 5000 



n 


Gaussian design, eta = Pauli, R A 2 = 1 


— U-St; 

— RSS 


1000 


2000 3000 

n 


4000 


5000 


Fig 2. Gaussian design, and random Pauli p, with R = 0.1 (left picture) and R = 1 (right 
picture). 


Pauli design, eta = Dirac, R A 2 = 0.1 

cc 


— U-Sti 
- Rfifi 



0 1000 2000 3000 4000 5000 


n 


Pauli design, eta = Dirac, R A 2 = 1 



— u-st 

— RSS 



0 1000 2000 3000 4000 5000 


n 


Fig 3. Pauli design, and random Dirac (a single entry, chosen at random, is non-zero on the 
diagonal) rj, with R = 0.1 (left picture) and R = 1 (right picture). 


imsart-generic ver. 2011/11/15 file: ellllowrankrevision.tex date: December 22, 2015 
































A. Carpentier, J. Eisert, D. Gross and R. Nickl/Uncertainty Quantification for Matrix C\% 



R = 0.1 

R = 1 

n 

100 

200 

500 

1000 

2000 

5000 

100 

200 

500 

1000 

2000 

5000 

Coverage U-Stat 

0.97 

0.98 

0.99 

1.00 

1.00 

1.00 

0.93 

0.96 

0.97 

0.98 

0.98 

0.98 

Diameter U-Stat 

1.10 

0.64 

0.34 

0.24 

0.18 

0.14 

2.43 

1.84 

1.44 

1.27 

1.17 

1.10 

Coverage RSS 

0.97 

0.97 

0.98 

0.98 

0.98 

0.98 

0.99 

0.99 

0.99 

0.99 

0.99 

0.99 

Diameter RSS 

0.38 

0.31 

0.23 

0.19 

0.16 

0.14 

1.69 

1.49 

1.32 

1.22 

1.16 

1.10 


Table 1 

Gaussian design, and random Dirac (a single entry, chosen at random, is non-zero on the 
diagonal) 77 , with R = 0.1 (left table) and R = 1 (right table). 



II 

o 

R= 1 

n 

100 

200 

500 

1000 

2000 

5000 

100 

200 

500 

1000 

2000 

5000 

Coverage U-Stat 

0.98 

0.98 

0.99 

0.99 

1.0 

1.0 

0.93 

0.95 

0.97 

0.98 

0.98 

0.98 

Diameter U-Stat 

1.10 

0.62 

0.34 

0.24 

0.18 

0.14 

2.40 

1.83 

1.43 

1.27 

1.18 

1.10 

Coverage RSS 

0.98 

0.98 

0.97 

0.97 

0.97 

0.97 

0.99 

0.99 

0.99 

0.99 

1.00 

1.00 

Diameter RSS 

0.39 

0.31 

0.23 

0.19 

0.17 

0.14 

1.71 

1.49 

1.31 

1.22 

1.16 

1.10 


Table 2 

Gaussian design, and random Pauli ry, with R = 0.1 (left table) and R = 1 (right table). 


5. Proofs 

5.1. Proof of Theorem 1 

Before we define the algorithm and prove the result, a few preparatory remarks 
are required: Our sequential procedure will be implemented in m = 1,2,..., T 
potential steps, in each of which 2 ■ 2 m = 2 m+1 measurements are taken. The 
arguments below will show that we can restrict the search to at most 

T = 0(log(d/e)) 

steps. We also note that from the discussion after (6) - in particular since 
c = c(5) from (7) is 0(1/S 2 ) - a simple union bound over m < T implies 
that the RIP holds with probability >1 — 5', some S' > 0, simultaneously for 
every m < T satisfying 2 m > c'kdlogd, and with T 2 ^(k) < cq, where d is a 
constant that depends on S', Cq only. The maximum over T = 0(\og(d/e)) terms 
is absorbed in a slightly enlarged poly-log term. Hence, simultaneously for all 
such sample sizes 2 m ,m < T, a nuclear norm regulariser exists that achieves 
the optimal rate from (10) with n = 2 m and for every k < d, with probability 
greater than 1 — <5/3. Projecting this estimator onto 0+ changes the Frobenius 
error only by a universal multiplicative constant (arguing as in (25) below), and 
we denote by 02 m € 0+ the resulting estimator computed from a sample of size 
2 m . 

We now describe the algorithm at the m-th step: Split the 2 m+1 observations 
into two halves and use the first subsample to construct € 0+ satisfying (10) 
with Pg-probability > 1 — <5/3. Then use the other 2 m observations to construct 
a confidence set C 2 ™ for 0 centred at 62 m: if 2 m < d 2 we take from (15) and 
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R = 0.1 

R = 1 

n 

100 

200 

500 

1000 

2000 

5000 

100 

200 

500 

1000 

2000 

5000 

Coverage U-Stat 

0.97 

0.98 

0.98 

0.99 

0.98 

0.98 

0.85 

0.54 

0.69 

0.69 

0.70 

0.71 

Diameter U-Stat 

1.10 

0.63 

0.34 

0.24 

0.18 

0.14 

2.28 

1.87 

1.43 

1.26 

1.18 

1.10 

Coverage RSS 

0.96 

0.96 

0.96 

0.96 

0.97 

0.97 

0.88 

0.89 

0.88 

0.88 

0.88 

0.88 

Diameter RSS 

0.39 

0.29 

0.23 

0.19 

0.16 

0.14 

1.70 

1.50 

1.30 

1.21 

1.16 

1.10 


Table 3 

Pauli design, and random Dirac (a single entry, chosen at random, is non-zero on the 
diagonal) q, with R = 0.1 (left table) and R = 1 (right table). 


Pauli design, eta = Pauli, R A 2 = 0.1 


cc 



0 1000 2000 3000 4000 5000 


n 

Pauli design, eta = Pauli, R A 2 = 1 


I X 



0 1000 2000 3000 4000 5000 


n 

Fig 4. Pauli design, and random Pauli r\, with R = 0.1 (left picture) and R = 1 (right 
picture). 


if 2 m > d 2 we take C 2 m from (20) - in both cases of non-asymptotic coverage at 
least 1 — a, a = 6/(3T ) [If a is unknown we proceed as described in Subsection 
3.2.4]. If \C 2 m\F < e we terminate the procedure (m = m, h = 2 m+1 , 9 = 9 2 ™), 
but if |C f 2 m \f > e we repeat the above procedure with 2 • 2 m+1 = 2 m+1+1 new 
measurements, etc., until the algorithm terminates, in which case we have used 

J2 2 m+1 < 2™ « n 

m<m 


measurements in total. 

To analyse this algorithm, recall that the quantile constants z, z a , appear¬ 
ing in the confidence sets (15) and (20) for our choice of a = S/(3T) grow at 
most as 0(log(l/a)) = O(logT) = o(logd). In particular in view of (10) and 
(16) or (21) the algorithm necessarily stops at a ‘maximal sample size’ n = 2 T+1 
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R = 0.1 

R = 1 

n 

100 

200 

500 

1000 

2000 

5000 

100 

200 

500 

1000 

2000 

5000 

Coverage U-Stat 

0.97 

0.97 

0.96 

0.86 

0.65 

0.58 

0.82 

0.22 

0.25 

0.27 

0.30 

0.37 

Diameter U-Stat 

1.09 

0.57 

0.34 

0.25 

0.18 

0.15 

2.45 

2.09 

1.33 

1.38 

1.19 

1.09 

Coverage RSS 

0.93 

0.86 

0.77 

0.77 

0.77 

0.77 

0.12 

0.19 

0.40 

0.63 

0.56 

0.53 

Diameter RSS 

0.38 

0.29 

0.22 

0.19 

0.16 

0.14 

1.71 

1.56 

1.31 

1.26 

1.14 

1.08 


Table 4 

Pauli design, and random Pauli rj, with R = 0.1 (left table) and R = 1 (right table). 


in which the squared Frobenius risk of the maximal model (k = d) is controlled 
at level e. Such T £ N is 0(log(d/e)) and depends on a,d,e,S, hence can be 
chosen by the experimenter. 

To prove that this algorithms works we show that the event 

{||» — e\\ 2 F > e 2 } U } = A, U A, 

has probability at most 2<5/3 for large enough C(S), 7 . By the union bound it 
suffices to bound the probability of each event separately by <5/3. For the first: 
Since n has been selected we know \Cn\p < e and since 9 = 9 „ the event Ai can 
only happen when 9 ^ C Therefore 


T 

Pe(Ai) < P e(9 p e(9 $ C 2m ) < 5 

m =1 


T 
3 T 


6 

3' 


For A 2 , whenever 9 £ R(k ) and for all to < T for which 2 m > c'kdlogd, we have, 
as discussed above, from (16) or ( 21 ) and ( 10 ) that 


E e |C 2 


I F ' 


< D 


, kd log T 


where D' is a constant. In the last inequality the expectation is taken under 
the distribution of the sample used for the construction of C 2 m , and it holds on 
the event on which 9 2 ™ realises the risk bound (10). Then let C(S), 7 be large 
enough so that C{5)kd(\ogd) 1 /e 2 > dkdXogd and let to 0 £ N be the smallest 
integer such that 

2mo > c(6)kd(\o S dr ' 

e 2 

Then, for C(S) large enough and since T = 0(log(d/e), 


p, 


< Pe (\C 2 ™ 0 \ 2 F > e 2 ) < 


MC2 


'■ell 


< 


D' log T 

C{S)(\ogdV 


<<5/3, 


by Markov’s inequality, completing the proof. 

Remark 1 (Isotropic sampling) The proof above works for isotropic design 
from Condition la) likewise. When 2 m > d 2 we replace the confidence set (20) in 
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the above proof by the confidence set from (18). Assuming also that ||0 ||f < M 
for some fixed constant M we can construct a similar upper bound for T and 
the above proof applies directly (with T of slighter larger but still small enough 
order). 

5.2. Proof of Theorem 2 

By Lemma 1 below with 9 = 6 — 9 the Pg-probability of the complement of the 
event 



is bounded by the deviation terms 2e cn and 2e C ( K ) Z ; respectively (note 2 = 0 
in Case a)). We restrict to this event in what follows. We can decompose 


1 


9 1 

|| X(9 - 9)|| 2 + -(e, X{9 -9)) + - V(e? - Es 2 ) = A + B + C. 
n n z —' 



Since P(l r + Z < 0) < P (Y < 0) + P [Z < 0) for any random variables Y. Z we 
can bound the probability 



1 

2 


\\e-~9\\l>A + B + c + -+ ~ z + ^ a 

n Jn 



by the sum of the following probabilities 




III := Pg 



The first probability I is bounded by 




n 


1 


\\X{9 — 9)\\ 2 — ||0 — 9\\ 2 f > max 



£ =0 


About term II: Conditional on X the variable f^(e,X(9 — 9)) is centred Gaus¬ 
sian with variance (o 2 /n)\\X(9 — 9 )|| 2 . The standard Gaussian tail bound then 
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gives by definition of z, and conditional on X , 


<exp{-^/2( f 7 2 /n)||A’(0-0)|| 2 } 

z a /3 max(3||fl - 0\\%Azd/n) 
2\\X(0-O)\\*/n 

since, on the event £, 



< exp{— 2 a/3 } = a/3 


max(3||0 — 0|| p, 4zd/n) > (2 / n)\\X {6 — 6)\\ 2 . 


The overall bound for II follows from integrating the last but one inequality 
over the distribution of X. Term III is bounded by a/3 by definition of £ a>cr . 

Remark 2 (Modification of the proof for Bernoulli errors) If instead of 
Gaussian errors we work with the error model from Subsection 2.3, we require a 
modified treatment of the terms II, III in the above proof. For the pure noise 
term III we modify the quantile constants slightly to £ ajCr = sj (1/a). If the 
number T of preparations satisfies T > 4d 2 then Chebyshev’s inequality and (8) 
give 



For the ‘cross term’ we have likewise with z a = yjl/ot and a* = {X{9 — Q))i 
that, on the event £, 




just as at the end of the proof of Theorem 2, so that coverage follows from 
integrating the last inequality w.r.t. the distribution of X. The scaling T ss d 2 
is similar to the one discussed in Theorem 3 in ref. [10]. 

Lemma 1 a) For isotropic design from Condition la) and any fixed matrix 
D £ Hd(C) we have, for every n £ N, 



In the standard Gaussian design case we can take c = 1/24. 

b) In the ‘Pauli basis’ case from Condition lb) we have for any fixed matr 
d £ Hd(C) satisfying the Schatten-l-norm bound ||'*9|| Si < 2 and every n £ N, 



where C(K) = 1 /[(16 + 8/3) K 2 ], and where K is the coherence constant of the 
basis. 
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Proof. We first prove the isotropic case. From (4) we see 


Pr 


hm 2 

n 


IW 


> 


/2 = Pr 


E( z '- Ez i 2 


i=1 



where the Zi/||i?||f are sub-Gaussian random variables. Then the Z?/||i?[||, are 
sub-exponential and we can apply Bernstein’s inequality (Prop. 4.1.8 in ref. [12]) 
to the last probability. We give the details for the Gaussian case and derive 
explicit constants. In this case gi := ■Zi/||'i?||F ~ N( 0,1) so the last probability 
is bounded, using Theorem 4.1.9 in ref. [12], by 


Pr 


E ^ 2 -!) 


*=l 



< 2 exp 


n 2 /4 1 
4 n + 2n J ’ 


and the result follows. 

Under Condition lb), if we write D = max(n||$|||./2, zd) we can reduce 
likewise to bound the probability in question by 


Pr 


E( y * - Ey i) 

i= 1 



where the Y) = \tr(X l d)\ 2 are i.i.d. bounded random variables. Using ||J5j|| op < 
K/y/d from Condition lb) and the quantum constraint ||i?||f < ||'i9||s , 1 < 2 we 
can bound 

l*i| < d 2 max ||£'i||opl|'^||s 1 < 4A" 2 d := U 

as well as 

E Y 2 < UE\Yi\ < 4K 2 d\\d\\% := s 2 . 

Bernstein’s inequality for bounded variables (e.g., Theorem 4.1.7 in ref. [12]) 
applies to give the bound 

2 ex pfw^f|^}<2exp { -C(K W , 

after some basic computations, by distinguishing the two regimes of D = n||i?||p./2 > 
zd and D = zd> n||i9|||i/2. ■ 


5.3. Proof of Theorem, 3 

Since E s R n = ||0 — 9\\ 2 F we have from Chebyshev’s inequality 

P e{0 i C n ) < P e (\R n - EA„| > z a> fj 
Var g (R n - ER n ) 
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Now U n = R n — E g R n is a centred U-statistic and has Hoeffding decomposition 
U n = 2L n + D n where 

1 n 

Ln = - E - ®e\YiX' m M & r n ,k ~ 9m,*) 

i—1 m,k 


is the linear part and 
2 

D n = 


n(n — 1) 


E E^U - E e[YiX' mtk ])(Y j x ' mtk - E[YjX l m j,}) 


i<j m,k 


the degenerate part. We note that L n and D n are orthogonal in L 2 
The linear part can be decomposed into L n = Ln 1 + Ln' 1 where 


L n ] = “EE E X L',k'X' m ,k&m>,k' 

2 — 1 m,k \m',k f 


®m,k I {®m,k €)?n,fe) 


and 

1 n 

L{ n = - E-E X kfc( 0 ™.fe - ^w)- 

i—1 m,k 

Now by the i.i.d. assumption we have 


Var e (L^) = a 2 


\\e-o\\ 2 f 

n 


Moreover, by transposing the indices to, k and m', k' in an arbitrary way into 
single indices M = 1,..., d 2 , K = 1 ,,d 2 , d 2 = p, respectively, basic compu¬ 
tations given before eq. (28) in ref. [26] imply that the variance of the second 
term is bounded by 


Var,(LW) < 


c\\o-d\\ 2 F \m 2 F 

n 


where c is a constant that depends only on KXf - t (which is finite since the Xu 
are sub-Gaussian in view of Condition la)). Moreover, the degenerate term 
satisfies 


Var e (D n ) < c-^||0||^ 

in view of standard U-statistic computations leading to eq. (6.6) in ref. [19], 
with d 2 = p , and using the same transposition of indices as before. This proves 
coverage by choosing the constants in the definition of z ajU large enough. 


5.4- Proof of Theorem 4 

We prove the result for symmetric matrices with real entries - the case of Her- 
mitian matrices requires only minor (mostly notational) adaptations. 

Given the estimator dpiiot> we can easily transform it into another estimator 
9 for which the following is true. 
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Theorem 5 There exists an estimator 9 that satisfies, uniformly in 9 £ R(k), 
for any k < d and with P g-probability greater than 1 — 25/3, 

\\e-e\\F<r n (k), 


as well as, 

9 £ R{k), 

and then also 


lie-fllta < V2kr n (k). 

Proof. Let 0pii o t and let 9 be the element of R(d) with smallest rank k! such 
that 

W*ot-~e\\ 2 F < r ^. 

Such 9 exists and has rank < k, with probability > 1 — 25/3, since 9 £ R(k) 
satisfies the above inequality in view of (23). The || • |||.-loss of 9 is no larger 
than r n (k) by the triangle inequality 

||0 — #||f < ||# - 5*Pilot || F + ll^Pilot - 9\\F, 

and this completes the proof of the third claim in view of (2). ■ 

The rest of the proof consists of three steps: The first establishes some aux¬ 
iliary empirical process type results, which are then used in the second step to 
construct a sufficiently good simultaneous estimate of the eigenvalues of 9. In 
Step III the coverage of the confidence set is established. 

STEP I 

Let 9 £ R + (k) = R(k ) n 0+ and let 9 be the estimator from Theorem 5. 
Then with probability > 1 — 25/3, and if rj = 9 — 9, we have 

IMI F < r 2 n(k) VO £ R+(k), (27) 

and that r/ £ R(2k). For the rest of the proof we restrict in what follows to the 
event of probability greater than or equal to 1 — 25/3 described by a) and b) in 
the hypothesis of the theorem. 

Write Y! = 1/ — tr(X l 9 ) for the ‘new observations’ 

Y[ = tr(X % rj) +£i, i = 1,... ,n. 

For any d x d! matrix V we set 


%{V) = V T 



V 


which estimates 

7v(V) = V T vV. 
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Let now U be any unit vector in R d . Then in the above notation {d! = 1) we 
can write 

1 ” 

%( C/ )= n E E U m U ml X^ m ,Y' 

i=l 
I n 

= E E u mUm'X \ 1 ,{tr{X l ri) + Ei) 

1=1 m,m'<.d 

1 n ( 

= ~ y ] ^ ' UmUm'X m m , I 'y ' X k k ,r)k t k' X Si 

i =1 m,m' <d \k,k'<d 

If U denotes the d x d matrix UU T , the last quantity can be written as 

-(XV,Xr}) + -(X U,e). 

n n 

We can hence bound, for S = {U £ R d : ||t/|| 2 = 1} 



sup \%(U) 

7]eR(2k),\\ri\\ F <r n {k),UeS 


< sup 

-(XU,X V )-(U, V ) 

+ sup 

~(X\J,e) 

r,&R(2k),\\ V \\ F <r n (k),UGS 

n 

UGS 

n 


Lemma 2 The right hand side on the last inequality is, with probability greater 
than 1 — S, of order 


v n ■■= O 


^ n (/e)r n (fc) 



Proof. The first term in the bound corresponds to the first supremum on the 
right hand side of the last inequality, and follows directly from the matrix RIP 
(and Lemma 4). For the second term we argue conditionally on the values of X 
and on the event for which the matrix RIP is satisfied. We bound the supremum 
of the Gaussian process 

G e (J7) := -E(*u,e> ~ N(0,\\XV\\ 2 /n) 

Jn 


indexed by elements U of the unit sphere S of R d , which satisfies the metric 
entropy bound 

log N{8,S, || • ||) < dlog(A/S) 

by a standard covering argument. Moreover U = UU T £ R( 1) and hence for any 
pair of vectors U, U £ S we have that U — U £ R( 2). From the RIP we deduce 
for every fixed U, U £ S that 


—1| A'U — TU|| 2 = ||U-U||| 1 + 


±\\X(U -U)|| 2 -||U-U|| 


IIU-UIII 

< (l + r n (2))||U-U|||< C\\U-U\\ 
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since t„( 2 ) = 0 ( 1 ) and since 

||U - U||| = Y, {UrnUrn' - UmUm'Y 

m,m' 

= Y ( UmUm> - UmUm> + U m U m ' - U m U m 'f < 2\\U - Uf. 

m,m' 


Hence any 5-covering of S in || • || induces a 8/C covering of S in the intrinsic 
covariance d& e of the (conditional on X) Gaussian process <G e , i.e., 

logN(6,S,d Ge )<dlog(A'/6) 

with constants independent of X. By Dudley’s metric entropy bound (e.g., ref. 
[12]) applied to the conditional Gaussian process we have for D > 0 some 
constant 

r D 

E sup |G e (f7)| < / ^/\ogN{S,S,d Ge )d5<Vd 
ues Jo 

and hence we deduce that 

E £ sup i |(^U,e)| = E e -U SU p |G e ([/)| <J d (28) 

UGS n yjn ugs V n 

with constants independent of X , so that the result follows from applying 
Markov’s inequality. ■ 

STEP II: 

Define the estimator 


§ '=o+~Y xiY i= 9 +u i d)- 
1 2=1 

Then we can write, using U T 'yr](Id)U = 7 V (U), 

u T e’u - u T eu = u T (o + %(i d ))u - u T (o + rj)u 
= %(t/)-7 V (U), 

and from the previous lemma we conclude, for any unit vector U that with 
probability >1 — 5, 

\u T e’u-u T eu\<v n . 

Let now 9 be any symmetric positive definite matrix such that 

| U T OU - U t 9’U\ < v n . 

Such a matrix exists, for instance 9 £ R + (k), and by the triangle inequality we 
also have 

\U t 9U - U t 9U\ < 2v n . (29) 
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Lemma 3 Let M be a symmetric positive definite dx d matrix with eigenvalues 
X j’s ordered such that Ai > A 2 > ... > A d- For any j < d consider an arbitrary 
collection of j orthonormal vectors Vj = ( V L : 1 < 1 < j) in R d . Then we have 

a) Aj + i < sup U T MU, 

U £<S ,U-Lspan(Vj) 

and 

h) ^A t > 

!■<] 

The proof of this lemma is basic and given in the appendix. Let now R be the 
rotation that diagonalises 6 such that R T 6R = diag(Xj : j = 1,..., d) ordered 
such that Xj > A 7+ i Vj. Moreover let R be the rotation that does the same for 
9 and its eigenvalues Xj. We apply the previous lemma with M = 6 and V equal 
to the column vectors r L : t < l — 1 of R to obtain, for any fixed l < j < d, 



A i < 

sup U t 9U , 

(30) 



U eS ,U±span(r b ,L<l—l) 


and also that 


M 

IV 

M 

(31) 



Vi 

■Ol 

V! 


From (29) we 

deduce, that 




X 1 < sup U t 9U + 2v n = Xj + 2v n V l < j, 

U(zS ,U -Lspan(r L ,L<j—l) 


as well as 

^2 Xi > ^2 rf Or 1 - 2 jv n = ^ A; - 2 jv n , 

i<j i<j i<j 

with probability > 1 — 8. Combining these bounds we obtain 


J2~ Xl ~J2 Xl 

l<:i i<j 


< 2jv n , 


j < d. 


(32) 


STEP III 

We show that the confidence sets covers the true parameter on the event of 
probability > 1 — <5 on which Steps I and II are valid, and for the constant C 
chosen large enough. 

Let II = n fl+ , 2 ^ be the projection operator onto R + (2k). We have 

||tf-0|| Sl < ||'f?-n<?|| 5l + ||n0-0|| Sl . 
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We have, using (32) and Lemma 5 below 

line-fills, = E E 

J>2k J<2k 

< 1 — ^2 X J + rkv n 

J<2k 

< 6v n k < (C/2)\/kr n (k) 

for C large enough. 

Moreover, using the oracle inequality (24) with S' = 110 and (25), 

||fi-n 0 || Sl < v^ii^-nfiUjr 

< Vd;{\\S-e\\ F + \\ue-e\\ F ) 

< V4^(||l? — 0Pilot ||F + ||fipilot — 0 ||f + ||nfi — 0 ||f) 

< Vl(r n (k) + \\m-d\\ F ). 

We finally deal with the approximation error: Note 

l|nfi-fi ||f =J2 X ^ < max |A Z | |A ; |. 

. l>2k 

l>2k l>2k 

By (32) we know that 

5> = 1-£a z <1-5> + 2 v n k < 4 v n k. 

l>k l<k l<k 

Hence out of the A/ ’s with indices l > k there have to be less than k coefficients 
which exceed 4u n . Since the eigenvalues are ordered this implies that the A;’s 
with indices l > 2k are all less than or equal to 4v n , and hence the quantity in 
the last but one display is bounded by (since k < 2k), using again (32) and the 
definition of k, 

4^n ( 1 - ^ |Aj| J < V n I 1 - |Az| j +kv„< < Vkr n (k). 

\ l<k ) \ l<k ) 

Overall we get the bound 

||fi - nfi|| Sl < kv n < {C/2)\flr n {k) 

for C large enough, which completes the proof of coverage of C n by collecting the 
above bounds. The diameter bound follows from k < k (in view of the defining 
inequalities of k being satisfied, for instance, for 9' = 0, whenever 0 £ R + (ko).) 
We conclude with the following auxiliary results used above. 
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Lemma 4 Under the RIP (6) we have for every 1 < k < d that, with probability 
at least 1 — <5, 


sup 

A,BGR(k) 


±{XA,XB)-{A,B) F 

U\\f\\B\\ f 


< 10 T n (k). 


Proof. The matrix RIP can be written as 


sup 

AeR{k) 


(XA, XA) 

n(A, A) f 


{(A^n^M -I)A) f \ 
( A,A) F 


< T n (k), 


(33) 


(34) 


for a suitable M £ 11^2 (C). The above bound then follows from applying the 
Cauchy-Schwarz inequality to 

— (XA, XB) - (A, B) F = (AAn^M -1)B) f . (35) 

n 


The proof of the following basic lemma is left to the reader. 

Lemma 5 Let M > 0 with positive eigenvalues (A j)j ordered in decreasing 
order. Denote with the projection onto R + (j — 1) = R(j — 1) D 0 + . 

Then for any 2 < j < d we have 

Y J ^' = \\M-A R+{j - 1) M\\ Sl . 

j’>j 

Acknowledgements. This work has been supported by the EU (SIQS, 
RAQUEL), the ERG (TAQ, UQMSI) and the DFG (SPP1798, MuSyAd Emmy 
Noether grant). AC worked on this project while a postdoc at the University of 
Cambridge. We also acknowledge discussions with C. Riofrio. 


References 

[ 1 ] L. Artiles, R. Gill, and M. Guta. An invitation to quantum tomography. 
J. Roy. Statist. Soc., 67:109, 2005. 

[2] K. M. R. Audenaert and S. Scheel. Quantum tomographic reconstruction 
with error bars: a Kalman filter approach. New J. Phys., 11(2):023028, 
2009. 

[31 R. Blume-Kohout. Robust error bars for quantum tomography, 2012. 
arXiv: 1202.5270. 

[4] A.D. Bull and R. Nickl. Adaptive confidence sets in L 2 . Probability Theory 
and Related Fields, 156:889-919, 2013. 

[5] E. J. Candes and Y. Plan. Tight oracle inequalities for low-rank matrix 
recovery from a minimal number of noisy random measurements. IEEE 
Trans. Inform. Theory, 57(4):2342-2359, 2011. 

[ 6 ] A. Carpentier and A. Kim. An iterative hard thresholding estimator for low 
rank matrix recovery with explicit limiting distribution. arXiv:1502. Of 65 j, 
2015. 


imsart-generic ver. 2011/11/15 file: ellllowrankrevision.tex date: December 22, 2015 









A. Carpentier, J. Eisert, D. Gross and R. Nickl/Uncertainty Quantification for Matrix C3SL 


[7] A. Carpentier and R. Nickl. On signal detection and confidence sets for 
low rank inference problems. Electronic J. Stat., to appear, 2015. 

[8] M. Christandl and R. Renner. Reliable quantum state tomography. Phys. 
Rev. Lett., 109:120403, 2012. 

[9] R. De Eq. A brief introduction to Fourier analysis on the Boolean cube. 
Theo. Comp., 1:1-20, 2008. 

[10] S. T Flammia, D. Gross, Y.-K. Liu, and J. Eisert. Quantum tomogra¬ 
phy via compressed sensing: error bounds, sample complexity and efficient 
estimators. New J. Phys., 14(9):095022, 2012. 

[11] E. Gine and R. Nickl. Confidence bands in density estimation. Ann. Statist., 
38:1122 1170, 2010. 

[12] E. Gine and R. Nickl. Mathematical foundations of infinite-dimensional 
statistical models, to appear, Cambridge University Press, 2015. 

[13] D. Gross. Recovering low-rank matrices from few coefficients in any basis. 
IEEE Trans. Inf. Th., 57(3):1548-1566, 2011. 

[14] D. Gross, Y.-K. Liu, S. T Flammia, S. Becker, and J. Eisert. Quantum 
state tomography via compressed sensing. Phys. Rev. Lett., 105(15):150401, 
2010 . 

[15] M. Guta, T. Kypraios, and I. Dryden. Rank-based model selection for 
multiple ions quantum tomography. New J. Phys., 14:105002, 2012. 

[16] H. Haeffner, W. Haensel, C. F. Roos, J. Benhelm, D. C. al Kar, M. Chwalla, 
T. Koerber, U. D. Rapol, M. Riebe, P. O. Schmidt, C. Becher, O. Glihne, 
W. Dur, and R. Blatt. Scalable multi-particle entanglement of trapped 
ions. Nature, 438:643, 2005. 

[17] M. Hoffmann and R. Nickl. On adaptive inference and confidence bands. 
Ann. Statist., 39:2382-2409, 2011. 

[18] A. S. Holevo. Statistical structure of quantum theory. Springer, 2001. 

[19] Y. I. Ingster, Tsybakov A. B., and N. Verzelen. Detection boundary in 
sparse regression. Elec. J. Stat.., 4:1476-1526, 2010. 

[20] A. Javanmard and A. Montanari. Confidence intervals and hypothesis test¬ 
ing for high-dimensional regression. J. Mach. Learn. Res., 15(l):2869-2909, 
2014. 

[21] V. Koltchinskii. Von Neumann entropy penalization and low-rank matrix 
estimation. Ann. Statist., 39(6):2936-2973, 2011. 

[22] V. Koltchinskii, K. Lounici, and A. B. Tsybakov. Nuclear-norm penaliza¬ 
tion and optimal rates for noisy low-rank matrix completion. Ann. Statist., 
39(5):2302-2329, 2011. 

[23] U. Leonhardt. Measuring the quantum state of light. Cambridge University 
Press, Cambridge, 2005. 

[24] K.-C. Li. Honest confidence regions for nonparametric regression. Ann. 
Statist., 17:1001-1008, 1989. 

[25] Y.-K. Liu. Universal low-rank matrix recovery from Pauli measurements. 
In Adv. Neur. Inf. Proc. Sys., pages 1638-1646, 2011. 

[26] R. Nickl and S. van de Geer. Confidence sets in sparse regression. Ann. 
Statist., 41(6):2852-2876, 2013. 

[27] M. A. Nielsen and I. L. Chuang. Quantum computation and quantum in- 


imsart-generic ver. 2011/11/15 file: ellllowrankrevision.tex date: December 22, 2015 



A. Carpentier, J. Eisert, D. Gross and R. Nickl/Uncertainty Quantification for Matrix (32 


formation. Cambridge University Press, Cambridge, 2000. 

[28] A. Peres. Quantum theory. Springer, Berlin, 1995. 

[29] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum-rank solutions 
of linear matrix equations via nuclear norm minimization. SIAM Rev., 
52:471, 2010. 

[30] J. Robins and A.W. van der Vaart. Adaptive nonparametric confidence 
sets. Ann. Statist., 34:229-253, 2006. 

[31] J. Shang, H. K. Ng, A. Sehrawat, X. Li, and B.-G. Englert. Optimal error 
regions for quantum state estimation. New J. Phys., 15(12):123026, 2013. 

[32] A. Smith, C. A. Riofrio, B. E. Anderson, H. Sosa-Martinez, I. H. Deutsch, 
and P. S. Jessen. Quantum state tomography by continuous measurement 
and compressed sensing. Phys. Rev. A, 87:030102(R), 2013. 

[33] K. Temme and F. Verstraete. Quantum chi-squared and goodness of fit 
testing. J. Math. Phys., 56(1):012202, 2015. 

[34] S. van de Geer, P. Buhlmann, Y. Ritov, and R. Dezeure. On asymptotically 
optimal confidence regions and tests for high-dimensional models. Ann. 
Statist., 42(3):1166-1202, 2014. 

[35] C.H. Zhang and S. S. Zhang. Confidence intervals for low dimensional 
parameters in high dimensional linear models. J. R. Stat.. Soc. Ser. B. 
Stat. Methodol., 76(1):217 242, 2014. 


6. Appendix 

6.1. Pauli spin measurements & Quantum Tomography 

This work was partly motivated by a problem arising in present-day physics 
experiments that aim at estimating quantum states. Conceptually, a quantum 
mechanical experiment involves two stages: A source (or preparation procedure ) 
that emits quantum mechanical systems with unknown properties, and a mea¬ 
surement device that interacts with incoming quantum systems and produces 
real-valued measurement outcomes, e.g. by pointing a dial to a value on a scale. 
Quantum mechanics stipulates that both stages are completely described by 
certain matrices. 

The properties of the source are represented by a positive semi-definite unit 
trace matrix 9, the quantum state, also referred to as density matrix. In turn, 
the measurement device is modelled by a Hermitian matrix X, which is referred 
to as an observable in physics jargon. A key axiom of the quantum mechanical 
formalism states that if the measurement X is repeatedly performed on systems 
emitted by the source that is preparing 9, then the real-valued measurement 
outcomes will fluctuate randomly with expected value 

(X,9) F = tr(X0). (36) 

The precise way in which physical properties are represented by these matrices 
is immaterial to our discussion (cf. any textbook, e.g. ref. [28]). We merely note 


imsart-generic ver. 2011/11/15 file: ellllowrankrevision.tex date: December 22, 2015 


A. Carpentier, J. Eisert, D. Gross and R. Nickl/Uncertainty Quantification for Matrix C2B 


that, while in principle any Hermitian X can be measured by some physical 
apparatus, the required experimental procedures are prohibitively complicated 
for all but a few highly structured matrices. This motivates the introduction of 
Pauli designs below, which correspond to fairly tractable ‘spin measurements’. 

The quantum state estimation or quantum tomography 1 problem is to esti¬ 
mate an unknown density matrix 6 from the measurement of a collection of 
observables X 1 ,..., X n . This task is of particular importance to the young field 
of quantum information science [27]. There, the sources might be carefully en¬ 
gineered components used for technological applications such as quantum key 
distribution or quantum computing. In this context, quantum state estimation 
is the process of characterising the components one has built - clearly an im¬ 
portant capability for any technology. 

A major challenge lies in the fact that relevant instances are described by d x 
d-matrices for fairly large dimensions d ranging from 100 to 10.000 in presently 
performed experiments [16]. Such high-dimensional estimation problems can 
benefit substantially from structural properties of the objects to be recovered. 
Fortunately, the density matrices occurring in quantum information experiments 
are typically well-approximated by matrices of low rank r <C d. In fact, in the 
practically most important applications, one usually even aims at preparing a 
state of unit rank - a so-called pure quantum state. 

6.1.1. Pauli observables 

We now introduce a paradigmatic set of quantum measurements that is fre¬ 
quently used in both theoretical and practical treatments of quantum state esti¬ 
mation (see, e.g., refs. [14, 16]). For a more general account, we refer to standard 
textbooks [18, 27]. The purpose of this section is to motivate the ‘Pauli design’ 
case (Condition lb) of the main theorem, as well as the approximate Gaussian 
noise model described in Subsection 2.3. 

We start by describing ‘spin measurements’ on a single ‘spin-1/2 particle’. 
Such a measurement corresponds to the situation of having d = 2. Without 
worrying about the physical significance, we accept as fact that on such particles, 
one may measure one of three properties, referred to as the ‘spin along the x, y , 
or z-axis’ of R 3 . Each of these measurements may yield one of two outcomes, 
denoted by +1 and —1 respectively. 

The mathematical description of these measurements is derived from the 
Pauli matrices 


' 0 

1 ' 

9 

' 0 

—i 

3 

' l 

o 1 

1 

0 

II 

b 

i 

0 

, <7 = 

0 

-1 

i-H 


1 The term ‘tomography’ goes back to the use of Radon transforms in early schemes 
for estimating quantum states of electromagnetic fields [23, 1]. It has become synonymous 
with ‘quantum density matrix estimation’, even though current methods applied to quan¬ 
tum systems with a finite dimension d have no technical connection to classical tomographic 
reconstruction algorithms. 
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in the following way. Recall that the Pauli matrices have eigenvalues =bl. For 
x £ {1,2,3} and j £ {+1,-1}, we write ip x for the normalised eigenvector of 
a x with eigenvalue j. The spectral decomposition of each Pauli spin matrix can 
hence be expressed as 

a x (38) 

with 

-Kl = V’iW’i)* > 0 (39) 

denoting the projectors onto the eigenspaces. Now, a physical measurement of 
the ‘spin along direction x' on a system in state 9 will give rise to a { — 1,1}- 
valued random variable C x with 

P (C x = j) = tr (t t x 9) , (40) 

where 9 £ H 2 (C). Using eq. (38), this is equivalent to stating that the expected 
value of C x is given by 

E(C X ) = tr ( a x 9 ). (41) 


Next, we consider the case of joint spin measurements on a collection of N 
particles. For each, one has to decide on an axis for the spin measurement. Thus, 
the joint measurement setting is now described by a word x = (aq,..., xjv) £ 
{1, 2,3}^. The axioms of quantum mechanics posit that the joint state 9 of the 
N particles acts on the tensor product space (C 2 )®^, so that 9 £ H 2 jv(C). 

Likewise, the measurement outcome is a word j = (j i,..., jw) £ {1, —1}^, 
with ji the value of the spin along axis aq of particle i = 1,... ,7V. As above, 
this prescription gives rise to a {1, — lj^-valued random variable C x . Again, the 
axioms of quantum mechanics imply that the distribution of C x is given by 

P (C x = j) = tr ((4 1 ® • • • ® 7 r^)0) . (42) 

Note that the components of the random vector C x are not necessarily inde¬ 
pendent, as 9 will generally not factorise 

It is often convenient to express the information in eq. (42) in a way that 
involves tensor products of Pauli matrices, rather than their spectral projections. 
In other words, we seek a generalisation of eq. (41) to N particles. As a first 
step toward this goal, let 


xU) 


— 1 number of —1 elements in j is odd 

1 number of —1 elements in j is even 


(43) 


be the parity function. Then one easily verifies 


tr((a x '®---®a x »)e)= ]T xti) tr (0(tt£ 0 • • • ® tt^)) = E( X (C“)). 

je{i ,-i} N 

(44) 

In this sense, the tensor product a Xl ® ■ ■ • ® a XN describes a measurement of 
the parity of the spins along the respective directions given by x. 
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In fact, the entire distribution of C x can be expressed in terms of tensor 
products of Pauli matrices and suitable parity functions. To this end, we extend 
the definitions above. Write 


a 


o 


1 0 
0 1 


(45) 


for the identity matrix in M 2 (C). For every subset S of {1,...,IV}, define the 
‘parity function restricted to S” via 


Xs(j) = { 


-1 

1 


number of —1 elements ji for i £ S is odd 


number of —1 elements j t for i £ S is even. 
Lastly, for S C {1,..., N} and x £ {1, 2, 3}^, the restriction of x to S is 


xf = 


Xi 

0 


i £ S 
i # S. 


(46) 


(47) 


Then for every such x, S one verifies the identity 

tr((a x " ® •• • ® o*»)0) = E ( X s(C x )). (48) 

In other words, the distribution of C x contains enough information to compute 
the expectation value of all observables ( a Xl ®- • -ig)cr XN ) that can be obtained by 
replacing the Pauli matrices on an arbitrary subset S of particles by the identity 
er°. The converse is also true: the set of all such expectation values allows one 
to recover the distribution of C x . The explicit formula reads 

HC x =j)=^ £ Xs(jMxs(C x )) (49) 

SC{1,...,JV} 

= ^n £ Xs{j)tr(6(a x ° ®---®a x »)) 


and can be verified by direct computation. [Note that E(\ 5 (C' :c )) is effectively 
a Fourier coefficient (over the group ) of the distribution function of the 
{—1, 1} ,v -valued random variable C x . Equation (49) is then nothing but an 
inverse Fourier transform.] 

In this sense, the information obtainable from joint spin measurements on N 
particles can be encoded in the 4 W real numbers 

2~ N / 2 tr{{a Vl ®---®a VN )0), y e {0,1,2,3}*. (50) 

Indeed, every such y arises as y = x s for some (generally non-unique) combi¬ 
nation of x and S. This representation is particularly convenient from a math¬ 
ematical point of view, as the collection of matrices 

E v :=2- N/2 a Vl ®---®a VN , ye {0,1,2,3}* (51) 
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forms an ortho-normal basis with respect to the (•, -)p inner product. Thus the 
terms in eq. (50) are just the coefficients of a basis expansion of the density 
matrix 9. 2 


6.1.2. From (50) to Condition lb) 


Following [10] we use eq. (50) as our model for quantum tomographic measure¬ 
ments. Note that the E v satisfy Condition lb) with coherence constant K = 1 
and d = 2 N . In the model (1) under Condition lb) we wish to approximate 
d ■ tr(E y 9) for a fixed observable E y (we fix the random values of the A' l ’s here) 
and for d = 2 N . If y = x s for some setting x and subset S, then the parity 
function B y := Xs{C x ) has expected value 2 N f 2 ■ tr{E v 9) = \fd ■ tr(E y 9) (see 
eqs. (48) and (51)), and itself is a Bernoulli variable taking values {1, —1} with 


p = P(B y = 1 ) 


1 + \fdtr{E v 9) 
2 


Note that 

Vd\tr{E y 9)\ < Vd\\E y \\ op \\e\\ Sl < 1, 

so indeed p £ [0,1] and the variance satisfies 


Var B y = 1 -d-tr(E y 9) 2 < 1. 


This is precisely the error model described in Subsection 2.3. 


6.2. Proof of Lemma 3 

a): Consider the subspaces E = span((V L ) L <j)- L and F = span((e L ) L <j + \) of 
R d , where the efs are the eigenvectors of the d x d matrix M corresponding 
to eigenvalues Xj. Since dim(U) + dim(F) = (d — j) + j + 1 = d + 1, we know 
that E p| F is not empty and there is a vectorial sub-space of dimension 1 in 
the intersection. Take U £ E f] F such that ||(7|| = 1. Since U £ F, it can be 
written as 

l+i 

u = ^2 

L= 1 

for some coefficients u L . Since the e t ’s are orthogonal eigenvectors of the sym¬ 
metric matrix M we necessarily have 


l+i 

MU = ^2 

L= 1 

2 We note that quantum mechanics allows to design measurement devices that directly 
probe the observable of <j Vi (g>- • -<S>o- yN , without first measuring the spin of every particle and 
then computing a parity function. In fact, the ability to perform such correlation measurements 
is crucial for quantum error correction protocols [27]. For practical reasons these setups are 
used less commonly in tomography experiments, though. 
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and thus 


3 +1 

U T MU = ^X L u 2 . 

t = 1 


Since the A, 's are all non-negative and ordered in decreasing absolute value, one 
has 

3 +1 3 + 1 

U t MU = > A J+ 1 5] u 2 = X ]+1 \\U\\ 2 = X j+1 . 

i=i i=i 


Taking the supremum in U yields the result. 

b): For each t < j, let us write the decomposition of V L on the basis of 
eigenvectors (ei : l < d) of M as 


l<d 

Since the (e/) are the eigenvectors of M we have 

Y,(V°) T MV 1 = EE A 'W) 2 ’ 

i<j i<j ;=i 

where T,t=M) 2 = 1 and £ t <jK) 2 < 1, since the V L are orthonormal. The 
last expression is maximised in (Vi)i<j,i<l<d and under these constraints, when 
v[ = 1 and v[ = 0 if i ^ l (since the (A t ) are in decreasing order), and this gives 

Y^{v c ) t mv 1 < ^ A t . 

3 *■<3 
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